[MS Info Sharing] Exchange 2007 DB resources fail to shutdown before timeout (SCC), Event 1115, 482, 414, 492
Title: Exchange 2007 DB resources fail to shutdown before timeout (SCC), Event 1115, 482, 414, 492
Intended Audience: IT Pro
Applies to Products:
Microsoft Exchange Server 2007 Standard Edition
Microsoft Exchange Server 2007 Enterprise Edition
Microsoft Exchange Server 2007 Service Pack 1
Source SR/Case Number: N/A

FAST PUBLISHING

Symptom

When you attempt to Take Offline or Move-CMS an Exchange 2007 Single Copy Cluster (SCC) that is under a heavy load, any database that has not yet dismounted may shut down uncleanly. The following can be seen in the cluster.log:

Attempting to take the DB offline:
00000d18.00001d60::2006/12/08-09:00:45.839 INFO Microsoft Exchange Database Instance <SG1/SG1DB1 (EVS1)>: [EXRES] calling EcUnmountDatabase()

The failure event after the timeout has been exceeded:
00000d18.00001b1c::2006/12/08-09:01:25.828 ERR Microsoft Exchange Database Instance <SG1/SG1DB1 (EVS1)>: [EXRES] EventLogging: Clustered Mailbox Server: EVS1 Physical Server: <ServerName>Failed to bring the resource SG1/SG1DB1 (EVS1) offline due to a timeout. Error Code: 1460.
2006/12/08-09:01:25.828 WARN Microsoft Exchange Database Instance <SG1/SG1DB1 (EVS1)>: [EXRES] State change: from 130 (OfflinePending) to 4 (Failed).

You may also see the following events in the application log on the passive node:

Event Type: Warning
Event Source: MSExchangeIS Mailbox Store
Event Category: General
Event ID: 1115
Description: Error 0xfffffbbe returned from closing database table, called from function JTAB_BASE::EcCloseTable on table Folders.

You will also see ESE errors indicating commit failures:

Event Type: Error
Event Source: ESE
Event Category: General
Event ID: 104
Description: MSExchangeIS (4284) <SG>: The database engine stopped the instance (4) with error (-1090).

Event Type: Error
Event Source: ESE
Event Category: General
Event ID: 482
Description: MSExchangeIS (4284) SG06: An attempt to write to the file <log file> at offset 457216 (0x000000000006fa00) for 512 (0x00000200) bytes failed after 0 seconds with system error 21 (0x00000015): "The device is not ready. ". The write operation will fail with error -1022 (0xfffffc02). If this error persists then the file may be damaged and may need to be restored from a previous backup.

Event Type: Error
Event Source: ESE
Event Category: Logging/Recovery
Event ID: 414
Description: MSExchangeIS (4284) SG06: Unable to write to section 0 while flushing logfile <log file> Error -1022 (0xfffffc02).

Event Type: Error
Event Source: ESE
Event Category: Logging/Recovery
Event ID: 492
Description: MSExchangeIS (4284) <SG>: The logfile sequence in <log file> has been halted due to a fatal error. No further updates are possible for the databases that use this logfile sequence. Please correct the problem and restart or restore from backup.

Event Type: Error
Event Source: ESE
Event Category: Logging/Recovery
Event ID: 471
Description: MSExchangeIS (4284) <SG>: Unable to rollback operation #141661 on database <Exchange Database>. Error: -510. All future database updates will be rejected.

Cause

This issue occurs because the database resources go into a failed state once they reach the timeout value. When that happens, the cluster brings offline the resources that were waiting on them, such as the physical disk, and the errors shown above are logged.

Resolution

These events can be eliminated by increasing the timeout on the database resources. A longer timeout gives the databases more time to shut down cleanly, which avoids the ESE errors. The default timeout is 180 seconds. Increasing the timeout also increases the time a failover can take. The right value is typically customer specific; for most clusters, a timeout of 5 minutes (300 seconds) should alleviate the problem.
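
The article states the timeout in seconds, while the PendingTimeout property appears in milliseconds (the 180-second default is listed as 180000). As a minimal sketch of that conversion, and of assembling the cluster.exe command line from it, something like the following could be used. The function names are hypothetical helpers for illustration only and are not part of any Microsoft tool; the script only builds the command string and does not run it.

```python
def pending_timeout_ms(seconds: int) -> int:
    """Convert a timeout in whole seconds to the millisecond value
    that the PendingTimeout resource property is listed in."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return seconds * 1000


def build_set_timeout_command(resource_name: str, seconds: int) -> str:
    """Build the cluster.exe command line that sets PendingTimeout.

    The resource name is wrapped in double quotes because database
    resource names contain spaces, slashes and parentheses.
    """
    value = pending_timeout_ms(seconds)
    return f'cluster res "{resource_name}" /prop PendingTimeout={value}'


if __name__ == "__main__":
    # Resource name taken from the example listing in this article.
    name = "First Storage Group/Mailbox Database (SCC-Mail1)"
    print(build_set_timeout_command(name, 300))
```

Generating the string rather than typing it by hand helps avoid the two easy mistakes with this command: a missing closing quote around the resource name and a misspelled property name.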

The value is set with the following command (the value is in milliseconds):

    cluster res "<Database Resource Name>" /prop PendingTimeout=300000

This can also be set in Cluster Administrator: open the properties of the database resource(s) and, on the Advanced tab, change the pending timeout to the desired value.

More Information

Note: SP1 includes some improvements to flushing the cache, but customers with large SCC clusters can still hit this issue.

To see the pending timeout on a database resource, run the following:

    cluster res "<SGName/DBName (CMSName)>" /prop

For example:

C:\>cluster res "First Storage Group/Mailbox Database (SCC-Mail1)" /prop

Listing properties for 'First Storage Group/Mailbox Database (SCC-Mail1)':

T  Resource                                         Name                     Value
-- ------------------------------------------------ ------------------------ -----------------------
SR First Storage Group/Mailbox Database (SCC-Mail1) Name                     First Storage Group/Mailbox Database (SCC-Mail1)
S  First Storage Group/Mailbox Database (SCC-Mail1) Type                     Microsoft Exchange Database Instance
S  First Storage Group/Mailbox Database (SCC-Mail1) Description
S  First Storage Group/Mailbox Database (SCC-Mail1) DebugPrefix
D  First Storage Group/Mailbox Database (SCC-Mail1) SeparateMonitor          0 (0x0)
D  First Storage Group/Mailbox Database (SCC-Mail1) PersistentState          1 (0x1)
D  First Storage Group/Mailbox Database (SCC-Mail1) LooksAlivePollInterval   4294967295 (0xffffffff)
D  First Storage Group/Mailbox Database (SCC-Mail1) IsAlivePollInterval      4294967295 (0xffffffff)
D  First Storage Group/Mailbox Database (SCC-Mail1) RestartAction            1 (0x1)
D  First Storage Group/Mailbox Database (SCC-Mail1) RestartThreshold         1 (0x1)
D  First Storage Group/Mailbox Database (SCC-Mail1) RestartPeriod            900000 (0xdbba0)
D  First Storage Group/Mailbox Database (SCC-Mail1) RetryPeriodOnFailure     4294967295 (0xffffffff)
D  First Storage Group/Mailbox Database (SCC-Mail1) PendingTimeout           180000 (0x2bf20)
D  First Storage Group/Mailbox Database (SCC-Mail1) LoadBalStartupInterval   300000 (0x493e0)
D  First Storage Group/Mailbox Database (SCC-Mail1) LoadBalSampleInterval    10000 (0x2710)
D  First Storage Group/Mailbox Database (SCC-Mail1) LoadBalAnalysisInterval  300000 (0x493e0)
D  First Storage Group/Mailbox Database (SCC-Mail1) LoadBalMinProcessorUnits 0 (0x0)
D  First Storage Group/Mailbox Database (SCC-Mail1) LoadBalMinMemoryUnits    0 (0x0)

Note: The default timeout is 180 seconds, shown in the listing as PendingTimeout 180000.

FAST PUBLISHING ARTICLES PROVIDE INFORMATION DIRECTLY FROM WITHIN THE MICROSOFT SUPPORT ORGANIZATION. THE INFORMATION CONTAINED HEREIN IS CREATED IN RESPONSE TO EMERGING OR UNIQUE TOPICS, OR IS INTENDED TO SUPPLEMENT OTHER KNOWLEDGE BASE INFORMATION.

DISCLAIMER

MICROSOFT AND/OR ITS SUPPLIERS MAKE NO REPRESENTATIONS OR WARRANTIES ABOUT THE SUITABILITY, RELIABILITY OR ACCURACY OF THE INFORMATION CONTAINED IN THE DOCUMENTS AND RELATED GRAPHICS PUBLISHED ON THIS WEBSITE (THE MATERIALS) FOR ANY PURPOSE. THE MATERIALS MAY INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS AND MAY BE REVISED AT ANY TIME WITHOUT NOTICE. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, MICROSOFT AND/OR ITS SUPPLIERS DISCLAIM AND EXCLUDE ALL REPRESENTATIONS, WARRANTIES, AND CONDITIONS WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO REPRESENTATIONS, WARRANTIES, OR CONDITIONS OF TITLE, NON INFRINGEMENT, SATISFACTORY CONDITION OR QUALITY, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WITH RESPECT TO THE MATERIALS.

Keywords: kbnoloc kbnomt kbrapidpub

Jeff Feng - MSFT
April 28th, 2009 10:30am

